AITopics | Aswan Governorate

Aswan Governorate

Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems | Feb-14-2026, 13:54:12 GMT

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
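As a rough illustration of the length disparity described above, the sketch below compares how many tokens the same greeting needs in two languages. It assumes a byte-level tokenizer (one token per UTF-8 byte), which is only a stand-in and not one of the tokenizers evaluated in the paper; the function names and the sample strings are hypothetical.

```python
def byte_token_count(text: str) -> int:
    """Token count under an assumed byte-level tokenizer (one token per UTF-8 byte)."""
    return len(text.encode("utf-8"))


def length_ratio(reference: str, translation: str) -> float:
    """How many times longer the translation is than the reference, in tokens."""
    return byte_token_count(translation) / byte_token_count(reference)


# Illustrative parallel pair (not taken from the paper).
english = "Hello"
russian = "Привет"
ratio = length_ratio(english, russian)
```

Under this assumption the Cyrillic greeting costs more tokens than the English one simply because each of its characters occupies two bytes. A learned subword tokenizer would give different numbers, but the same ratio could be computed by swapping in its token count.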

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > Haiti (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(38 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

The Longest Solar Eclipse for 100 Years Is Coming. Don't Miss It

WIRED | Dec-8-2025, 10:00:00 GMT

NASA has announced when the longest total solar eclipse of the century will occur, and you won't have to wait long. Here's what you should know. The duration of a total solar eclipse always varies. In April 2024, the eclipse that crossed North America lasted 4 minutes and 28 seconds.

artificial intelligence, eclipse, solar eclipse, (15 more...)

WIRED

Country:

Asia > Nepal (0.15)
Europe > Spain (0.06)
North America > United States > California (0.05)
(12 more...)

Industry:

Government > Space Agency (0.59)
Government > Regional Government > North America Government > United States Government (0.59)

Technology: Information Technology > Artificial Intelligence (0.30)

Multi-Objective Reinforcement Learning for Water Management

Osika, Zuzanna, Rădulescu, Roxana, Salazar, Jazmin Zatarain, Oliehoek, Frans, Murukannaiah, Pradeep K.

arXiv.org Artificial Intelligence | Nov-24-2025

Many real-world problems (e.g., resource management, autonomous driving, drug discovery) require optimizing multiple, conflicting objectives. Multi-objective reinforcement learning (MORL) extends classic reinforcement learning to handle multiple objectives simultaneously, yielding a set of policies that capture various trade-offs. However, the MORL field lacks complex, realistic environments and benchmarks. We introduce a water resource (Nile river basin) management case study and model it as a MORL environment. We then benchmark existing MORL algorithms on this task. Our results show that specialized water management methods outperform state-of-the-art MORL approaches, underscoring the scalability challenges MORL algorithms face in real-world scenarios.
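The "set of policies that capture various trade-offs" can be read as a Pareto front over per-objective returns. The following is a minimal sketch of that filtering step only, assuming every objective is to be maximized; the function names and the four return vectors are hypothetical and are not results or code from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(returns: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the return vectors that no other vector dominates."""
    return [r for r in returns if not any(dominates(o, r) for o in returns)]


# Hypothetical (objective_1, objective_2) returns for four policies.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
front = pareto_front(candidates)
```

Here the policy with returns (1.5, 3.0) is dropped because (2.0, 4.0) is better on both objectives; the remaining three represent different trade-offs. The pairwise check is quadratic in the number of policies, which is acceptable for a sketch but not tuned for large policy sets.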

artificial intelligence, machine learning, multi-objective reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2505.01094

Country:

Europe > Netherlands > South Holland > Delft (0.07)
Africa > Sudan (0.06)
Africa > Ethiopia (0.06)
(8 more...)

Genre: Research Report > New Finding (0.55)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Hydroelectric (0.70)
Water & Waste Management > Water Management > Water Supplies & Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

74bb24dca8334adce292883b4b651eda-Paper-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 22:13:22 GMT

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > Haiti (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(38 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(2 more...)

A Survey of Pun Generation: Datasets, Evaluations and Methodologies

Su, Yuchen, Zhu, Yonghua, Wang, Ruofan, Huang, Zijian, Benavides-Prado, Diana, Witbrock, Michael

arXiv.org Artificial Intelligence | Oct-6-2025

Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey that systematically reviews this specific area. To bridge this gap, this paper provides a comprehensive review of pun generation datasets and methods across different stages, including conventional approaches, deep learning techniques, and pre-trained language models. Additionally, we summarise both automated and human evaluation metrics used to assess the quality of pun generation. Finally, we discuss the research challenges and propose promising directions for future work.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.04793

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It

Li, Shuyue Stella, Bose, Avinandan, Brahman, Faeze, Du, Simon Shaolei, Koh, Pang Wei, Fazel, Maryam, Tsvetkov, Yulia

arXiv.org Artificial Intelligence | Oct-2-2025

Current large language model (LLM) development treats task-solving and preference alignment as separate challenges, optimizing first for objective correctness, then for alignment to aggregated human preferences. This paradigm fails in human-facing applications where solving a problem correctly is insufficient if the response mismatches the user's needs. This challenge intensifies in just-in-time scenarios where no prior user interaction history exists due to cold-start conditions or privacy constraints. LLMs need to identify what they don't know about user preferences, strategically elicit preference values through questioning, then adapt their reasoning processes and responses accordingly, a complicated chain of cognitive processes which we term personalized reasoning. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks using psychologically-grounded personas with sparse preferences. Our framework creates scenarios where identical questions require different reasoning chains depending on user context, as optimal explanation approaches vary by individual expertise and preferences while maintaining factual accuracy. Evaluation of 21 frontier models across 10 tasks reveals that 29.0% of naive personalization attempts produce worse preference alignment than generic responses, yet generic responses also fail to serve individual user needs effectively. These findings suggest personalized reasoning requires dedicated development rather than emerging naturally. PREFDISCO establishes personalized reasoning as a measurable research frontier and reveals fundamental limitations in current LLMs' interactive capabilities, providing a foundation for developing systems that can adapt to individual users in education, healthcare, and technical domains where personalization is critical.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.00177

Country:

Asia > Middle East > Jordan (0.05)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
(6 more...)

Genre: Research Report > New Finding (0.65)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.93)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Real-Time Fish Detection in Indonesian Marine Ecosystems Using Lightweight YOLOv10-nano Architecture

Wuntu, Jonathan, Putro, Muhamad Dwisnanto, Syahputra, Rendy

arXiv.org Artificial Intelligence | Sep-23-2025

Indonesia's marine ecosystems, part of the globally recognized Coral Triangle, are among the richest in biodiversity, requiring efficient monitoring tools to support conservation. Traditional fish detection methods are time-consuming and demand expert knowledge, prompting the need for automated solutions. This study explores the implementation of YOLOv10-nano, a state-of-the-art deep learning model, for real-time marine fish detection in Indonesian waters, using test data from Bunaken National Marine Park. YOLOv10's architecture, featuring improvements like the CSPNet backbone, PAN for feature fusion, and Pyramid Spatial Attention Block, enables efficient and accurate object detection even in complex environments. The model was evaluated on the DeepFish and OpenImages V7-Fish datasets. Results show that YOLOv10-nano achieves a high detection accuracy with mAP50 of 0.966 and mAP50:95 of 0.606 while maintaining low computational demand (2.7M parameters, 8.4 GFLOPs). It also delivered an average inference speed of 29.29 FPS on the CPU, making it suitable for real-time deployment. Although OpenImages V7-Fish alone provided lower accuracy, it complemented DeepFish in enhancing model robustness. Overall, this study demonstrates YOLOv10-nano's potential for efficient, scalable marine fish monitoring and conservation applications in data-limited environments.
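The mAP50 figure quoted above counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50:95 averages over IoU thresholds from 0.5 to 0.95. The sketch below shows only that matching criterion, assuming boxes in (x_min, y_min, x_max, y_max) form; the names are hypothetical and this is not the paper's evaluation code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred: Box, truth: Box) -> bool:
    """A prediction counts as a hit for mAP50 when IoU is at least 0.5."""
    return iou(pred, truth) >= 0.5
```

A full mAP computation would additionally rank predictions by confidence, match classes, and integrate the precision-recall curve; those steps are omitted here.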

detection, machine learning, real time system, (17 more...)

arXiv.org Artificial Intelligence

2509.17406

Country:

Asia > Indonesia (0.25)
Oceania > Australia (0.04)
Asia > Nepal (0.04)
Africa > Middle East > Egypt > Aswan Governorate > Aswan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Predicting Traffic Accident Severity with Deep Neural Networks

Bibb, Meghan, Rivas, Pablo, Tayba, Mahee

arXiv.org Artificial Intelligence | Sep-5-2025

Traffic accidents can be studied to mitigate the risk of further events. Recent advances in machine learning have provided an alternative way to study data associated with traffic accidents. New models achieve good generalization and high predictive power over imbalanced data. In this research, we study neural network-based models on data related to traffic accidents. We begin by analyzing relative feature collinearity and unsupervised dimensionality reduction through autoencoders, followed by a dense network. The features are derived from traffic accident data, and the target is accident severity. Our experiments show cross-validated results of up to 92% accuracy when classifying accident severity using the proposed deep neural network.

accident, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.03819

Country:

North America > United States > Texas (0.05)
North America > United States > Ohio (0.04)
Europe > Spain (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Government (0.47)
Transportation > Infrastructure & Services (0.47)
Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

When Alignment Hurts: Decoupling Representational Spaces in Multilingual Models

Elshabrawy, Ahmed, Kaing, Hour, Song, Haiyue, Aji, Alham Fikri, Tanaka, Hideki, Utiyama, Masao, Dabre, Raj

arXiv.org Artificial Intelligence | Aug-19-2025

Alignment with high-resource standard languages is often assumed to aid the modeling of related low-resource varieties. We challenge this assumption by demonstrating that excessive representational entanglement with a dominant variety, such as Modern Standard Arabic (MSA) in relation to Arabic dialects, can actively hinder generative modeling. We present the first comprehensive causal study of this phenomenon by analyzing and directly intervening in the internal representation geometry of large language models (LLMs). Our key contribution is an online variational probing framework that continuously estimates the subspace of the standard variety during fine-tuning, enabling projection-based decoupling from this space. While our study uses Arabic as a case due to its unusually rich parallel resources across 25 dialects, the broader motivation is methodological: dialectal MT serves as a controlled proxy for generative tasks where comparable multi-variety corpora are unavailable. Across 25 dialects, our intervention improves generation quality by up to +4.9 chrF++ and +2.0 on average compared to standard fine-tuning, despite a measured tradeoff in standard-language performance. These results provide causal evidence that subspace dominance by high-resource varieties can restrict generative capacity for related varieties. More generally, we unify geometric and information-theoretic probing with subspace-level causal interventions, offering practical tools for improving generative modeling in closely related language families and, more broadly, for controlling representational allocation in multilingual and multi-domain LLMs.
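The "projection-based decoupling" mentioned above can be pictured as removing from a hidden vector its component inside a given subspace. The following is a minimal sketch of that single projection step, assuming the subspace is already known and described by an orthonormal basis; the paper estimates it with an online variational probe, which is not reproduced here, and all names and values are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def remove_subspace(h: Vector, basis: List[Vector]) -> Vector:
    """Subtract from h its projection onto span(basis); basis must be orthonormal."""
    out = list(h)
    for b in basis:
        coeff = dot(h, b)
        out = [o - coeff * x for o, x in zip(out, b)]
    return out


# Hypothetical 3-d hidden state and a 1-d "standard variety" subspace.
hidden = [3.0, 4.0, 5.0]
standard_basis = [[1.0, 0.0, 0.0]]
decoupled = remove_subspace(hidden, standard_basis)
```

After the call, the result is orthogonal to every basis vector, so nothing of the assumed standard-variety direction remains in it. If the basis were not orthonormal, it would first need to be orthonormalized for the subtraction to be a true projection.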

artificial intelligence, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.12803

Country:

Asia > Singapore (0.05)
Asia > Indonesia > Bali (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
(25 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

A Appendix

Neural Information Processing Systems | Aug-14-2025, 22:37:16 GMT

It suggests that, for any m ∈ {k, ..., n-1} and z ∈ ℝ, L A.2 Proofs for Lemma 2 and 3 for the case when K is unknown in §4. Lemma 2. It suggests that, for any m ∈ {0, ..., n-1} and z ∈ ℝ, L For any m ∈ {0, ..., n-1} and z ∈ ℝ, we have L A.3 Additional tricks for methods proposed in §3. Finding optimal CP vector when z = in paraCP(n,k, ˆ T Additional pruning condition for parametric DP when K is fixed. In §3.3, we showed that Lemma 4. For n ∈ [N] and k ∈ [K], let T Therefore, it fails to control the false positive rate. This is an asymptotic test for multiple detected CPs. Fused Lasso (proposed by the same authors) is worse than BinSeg-SI. BinSeg-SI had been considered as a computationally efficient approximation of the problem in (7), where the authors additionally condition on extra information for computational tractability, e.g., the order in which CPs are detected.

cp vector, optseg-si method, optseg-si-oc, (15 more...)

Neural Information Processing Systems

Country: Africa > Middle East > Egypt > Aswan Governorate > Aswan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)
